Analyzing Multi-locus Plant Barcoding Datasets with a Composition Vector Method Based on Adjustable Weighted Distance
نویسندگان
چکیده
BACKGROUND The composition vector (CV) method has been proved to be a reliable and fast alignment-free method to analyze large COI barcoding data. In this study, we modify this method for analyzing multi-gene datasets for plant DNA barcoding. The modified method includes an adjustable-weighted algorithm for the vector distance according to the ratio in sequence length of the candidate genes for each pair of taxa. METHODOLOGY/PRINCIPAL FINDINGS Three datasets, matK+rbcL dataset with 2,083 sequences, matK+rbcL dataset with 397 sequences and matK+rbcL+trnH-psbA dataset with 397 sequences, were tested. We showed that the success rates of grouping sequences at the genus/species level based on this modified CV approach are always higher than those based on the traditional K2P/NJ method. For the matK+rbcL datasets, the modified CV approach outperformed the K2P-NJ approach by 7.9% in both the 2,083-sequence and 397-sequence datasets, and for the matK+rbcL+trnH-psbA dataset, the CV approach outperformed the traditional approach by 16.7%. CONCLUSIONS We conclude that the modified CV approach is an efficient method for analyzing large multi-gene datasets for plant DNA barcoding. Source code, implemented in C++ and supported on MS Windows, is freely available for download at http://math.xtu.edu.cn/myphp/math/research/source/Barcode_source_codes.zip.
منابع مشابه
Ribosomal RNA as molecular barcodes: a simple correlation analysis without sequence alignment
MOTIVATION We explored the feasibility of using unaligned rRNA gene sequences as DNA barcodes, based on correlation analysis of composition vectors (CVs) derived from nucleotide strings. We tested this method with seven rRNA (including 12, 16, 18, 26 and 28S) datasets from a wide variety of organisms (from archaea to tetrapods) at taxonomic levels ranging from class to species. RESULT Our res...
متن کاملA novel three-stage distance-based consensus ranking method
In this study, we propose a three-stage weighted sum method for identifying the group ranks of alternatives. In the first stage, a rank matrix, similar to the cross-efficiency matrix, is obtained by computing the individual rank position of each alternative based on importance weights. In the second stage, a secondary goal is defined to limit the vector of weights since the vector of weights ob...
متن کاملFeasibility of nuclear ribosomal region ITS1 over ITS2 in barcoding taxonomically challenging genera of subtribe Cassiinae (Fabaceae)
PREMISE OF THE STUDY The internal transcribed spacer (ITS) region is situated between 18S and 26S in a polycistronic rRNA precursor transcript. It had been proved to be the most commonly sequenced region across plant species to resolve phylogenetic relationships ranging from shallow to deep taxonomic levels. Despite several taxonomical revisions in Cassiinae, a stable phylogeny remains elusive ...
متن کاملRapid dissemination of taxonomic discoveries based on DNA barcoding and morphology
The taxonomic impediment is characterized by dwindling classical taxonomic expertise, and slow pace of revisionary work, thus more rapid taxonomic assessments are needed. Here we pair rapid DNA barcoding methods with swift assessment of morphology in an effort to gauge diversity, establish species limits, and rapidly disseminate taxonomic information prior to completion of formal taxonomic revi...
متن کاملFeature Selection Using Multi Objective Genetic Algorithm with Support Vector Machine
Different approaches have been proposed for feature selection to obtain suitable features subset among all features. These methods search feature space for feature subsets which satisfies some criteria or optimizes several objective functions. The objective functions are divided into two main groups: filter and wrapper methods. In filter methods, features subsets are selected due to some measu...
متن کامل